This document is intended for developers who write applications that include speech recognition features by making calls to the Speech Recognition Manager. (The Speech Recognition Manager is implemented in the Speech Recognition extension.)
PlainTalk 1.5 is a collection of software that enables your Macintosh to speak written text and respond to spoken commands. PlainTalk 1.5 includes three components, each with its own installer: English Text-to-Speech, Mexican Spanish Text-to-Speech, and English Speech Recognition. Another document ("About PlainTalk 1.5") describes each of these three software packages, and discusses the differences between PlainTalk 1.5 and earlier versions of PlainTalk.
Version 1.5 of the English Speech Recognition installer installs version 1.5.1 of the Speech Recognition extension. This document describes what's new in version 1.5.1 of the Speech Recognition extension.
For more information about the Speech Recognition extension and the Speech Recognition Manager, please visit Apple's Speech web site at <http://www.speech.apple.com/>. From that web site you can download the Speech Recognition Manager Software Developer Kit (SDK), which includes full documentation for the Speech Recognition Manager.
What's new in version 1.5.1 of the Speech Recognition extension
Version 1.5.1 of the Speech Recognition extension fixes a few bugs and adds or enhances a few features in the Speech Recognition Manager.
• A hang that occasionally occurred if an application created large language models or loaded many language models has been fixed.
• In Speech Recognition version 1.5, SRCancelRecognition () did not function as described in the Speech Recognition Manager documentation. In version 1.5.1, it works as documented.
• A crash that could occur when the SRNewLanguageObjectFromDataFile () or SRNewLanguageObjectFromHandle () routines were used to read a language model with multiple embedded sub-language models has been fixed.
• The "listen only while key(s) are pressed" feature (also known as the push-to-talk feature) is now more tolerant of noise or speech that occurs before the push-to-talk key is pressed, so little or no pause is needed before pressing the key and speaking a command. Using the push-to-talk feature increases recognition accuracy and eliminates misfires (cases where the computer misinterprets sounds that were not intended for it).
• In Speech Recognition version 1.5, if an application disabled its top-level language model -- for example, as a quick way to disable recognition as part of a custom push-to-talk mechanism -- the application would still get result notifications, with each notification indicating either that nothing was recognized (since the language model was disabled) or that the sound heard was rejected. With this release, if an application disables its top-level language model (the one passed to the SRSetLanguageModel call), that application will not get any speech recognition notifications while the language model is disabled.
• If an application uses speech recognition but chooses not to use the standard feedback and listening-mode behavior (by setting the SRRecognitionSystem's kSRFeedbackAndListeningModes property to kSRNoFeedbackNoListenModes), and that application then blocks other speech recognition applications from listening (by setting the Recognizer's kSRBlockModally property to true), the push-to-talk key (typically the 'esc' key) will not be "eaten". That is, the application will be notified when the user types that key, just as it is when the user types any other key. Before this release, the push-to-talk key was always eaten. This was a problem for fast action games that use speech recognition (but not the standard listening modes) and want to use that key for other purposes.
• The sound-input channel is freed now when an application blocks modally (by setting the Recognizer's kSRBlockModally property to true) and calls SRStopListening (). This is useful, for example, if a phone-dialer application wants to allow users to initiate phone calls by speaking a phrase like "Call my mother", and then (after the call is established) wants to use the microphone as part of a speaker-phone system. Before this release, if another speech recognition application (like the Speakable Items utility) were running, it could "own" sound input and the microphone, preventing the phone-dialer application from using the microphone for a speaker phone. Now, the phone-dialer application can (for the duration of the phone call) block Speakable Items and other applications and free the microphone by calling SRStopListening.
• When an application uses the standard feedback behavior (by setting the SRRecognitionSystem's kSRFeedbackAndListeningModes property to kSRHasFeedbackHasListenModes), the system plays a short feedback sound ('Single Click' by default) after each successful recognition. With this new release, it will do that only if another utterance is not already in progress. (In version 1.5, the sound could interfere with the recognition of a subsequent utterance.)
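Several of the changes above depend on a combination of settings, so it may help to see the conditions written out as code. The following C sketch is only a model of the behavior described in these notes; it does not call the Speech Recognition Manager and is not how the extension is implemented. Every name in it (ModelFeedbackMode, the model_ functions) is hypothetical and does not appear in the SDK headers. It also assumes that no feedback sound is played when an application opts out of the standard feedback behavior, which these notes do not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the feedback-and-listening-mode setting.
   These are NOT the constants declared in the SDK headers. */
typedef enum {
    MODEL_NO_FEEDBACK_NO_LISTEN_MODES,  /* application opts out of the standard behavior */
    MODEL_STANDARD_FEEDBACK_AND_MODES   /* application uses the standard behavior */
} ModelFeedbackMode;

/* True if the system consumes ("eats") the push-to-talk key.
   Per the notes, the key reaches the application only when it has opted
   out of the standard modes AND blocks modally. */
bool model_push_to_talk_key_eaten(ModelFeedbackMode mode, bool blocksModally)
{
    return !(mode == MODEL_NO_FEEDBACK_NO_LISTEN_MODES && blocksModally);
}

/* True if the sound-input channel is released for other uses:
   the application blocks modally and has stopped listening. */
bool model_sound_input_freed(bool blocksModally, bool listening)
{
    return blocksModally && !listening;
}

/* True if the feedback sound plays after a recognition attempt.
   Assumption: no sound at all outside the standard feedback behavior. */
bool model_plays_feedback_sound(ModelFeedbackMode mode, bool recognized,
                                bool utteranceInProgress)
{
    return mode == MODEL_STANDARD_FEEDBACK_AND_MODES
        && recognized
        && !utteranceInProgress;
}

/* Scenarios taken from the notes above. */
void model_self_check(void)
{
    /* Action game: no standard modes, blocks modally -> key is delivered. */
    assert(!model_push_to_talk_key_eaten(MODEL_NO_FEEDBACK_NO_LISTEN_MODES, true));
    /* Same game without blocking modally -> key is still eaten. */
    assert(model_push_to_talk_key_eaten(MODEL_NO_FEEDBACK_NO_LISTEN_MODES, false));
    /* Phone dialer during a call: blocks modally, has stopped listening. */
    assert(model_sound_input_freed(true, false));
    /* Feedback sound is held back while the next utterance is under way. */
    assert(!model_plays_feedback_sound(MODEL_STANDARD_FEEDBACK_AND_MODES, true, true));
}
```

Calling model_self_check() from a test harness walks through the game and phone-dialer scenarios. In a real application, the same conditions would be established through the property and listening calls named in the notes above.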